- Motivating case study
- Game-theoretic motivation
- Translation to ML
- Visualization
Enhancers are stretches of the genome that coordinate the expression of many downstream genes.
Understanding these interactions helps explain the genotype \(\to\) phenotype map.
These interactions can be studied in developing fruit fly embryos.
Which features drive enhancer activity in each sequence?
Data
Genome-wide measurements from blastoderm (stage 5) Drosophila embryos:
Each observation: a genomic sequence with associated regulatory features.
Enhancer status: \(y \in \{0,1\}\)
Predictors: \(x = (x_1, \ldots, x_D)\) (TF binding intensities, chromatin signals)
Model: \(f(x_i) = \mathbb{P}(y_i = 1 \mid x_i)\)
Goal. Quantify each feature’s contribution to \(f(x_i)\).
Game-theoretic analogy. How should profit be distributed across employees \(i\) in a company, when any team \(S\) earns profit \(v(S)\)?
Employee \(i\)’s credit: the marginal contribution \(v(S) - v(S \setminus \{i\})\), averaged over all teams \(S \ni i\).
\[\begin{align} \varphi(i) = \frac{1}{D} \sum_{d = 1}^{D} \frac{1}{\binom{D-1}{d-1}}\sum_{S \in S_{d}(i)} [v(S) - v(S \setminus \{i\})] \end{align}\]
where \(S_{d}(i)\) collects subsets of size \(d\) containing \(i\).
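For small \(D\), the formula above can be evaluated exactly by enumerating subsets. A minimal sketch (the toy profit game and its numbers are hypothetical):

```python
from itertools import combinations
from math import comb

def shapley(v, D):
    """Exact Shapley values for a game v over players {0, ..., D-1},
    using the subset-size formula above."""
    phi = []
    for i in range(D):
        total = 0.0
        for d in range(1, D + 1):                  # subset sizes
            for S in combinations(range(D), d):    # all subsets of size d
                if i not in S:
                    continue
                S_minus_i = tuple(p for p in S if p != i)
                total += (v(S) - v(S_minus_i)) / comb(D - 1, d - 1)
        phi.append(total / D)
    return phi

# Hypothetical profit game: each team's profit is the sum of individual
# outputs, plus a synergy bonus when employees 0 and 1 work together.
outputs = [10.0, 20.0, 30.0]
def v(S):
    profit = sum(outputs[p] for p in S)
    if 0 in S and 1 in S:
        profit += 6.0
    return profit

print(shapley(v, 3))  # [13.0, 23.0, 30.0]: the bonus splits equally
```

Each employee receives their individual output, and the synergy bonus is split equally between the two employees who create it.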
Shapley values are the unique credit assignment satisfying:
Symmetry. Equal marginal contributions \(\to\) equal credit. \[\begin{align*} \varphi(i) = \varphi(j) \;\text{ if }\; v(S \cup \{i\}) - v(S) = v(S \cup \{j\}) - v(S) \;\; \forall S \end{align*}\]
Dummy. Zero marginal contribution \(\to\) zero credit. \[\begin{align*} \varphi(i) = 0 \;\text{ if }\; v(S \cup \{i\}) = v(S) \;\; \forall S \end{align*}\]
Additivity. For two games \(v_1, v_2\), attributions add. \[\begin{align*} \varphi_i(v_1 + v_2) = \varphi_i(v_1) + \varphi_i(v_2) \end{align*}\]
Efficiency. All credit is distributed, with no double counting.
\[\sum_{i=1}^D \varphi(i) = v(\{1, \ldots, D\}) - v(\emptyset)\]
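Efficiency can be checked numerically via the equivalent permutation formulation: within any ordering of the players, the marginal contributions telescope to \(v(\{1,\ldots,D\}) - v(\emptyset)\), so the credits always sum to exactly that difference. A small sketch (the game is an arbitrary toy assumption):

```python
from itertools import permutations

def shapley_perm(v, D):
    """Shapley values via the permutation average:
    phi_i = mean over orderings of [v(predecessors + i) - v(predecessors)]."""
    phi = [0.0] * D
    perms = list(permutations(range(D)))
    for order in perms:
        seen = []
        for i in order:
            phi[i] += v(tuple(seen + [i])) - v(tuple(seen))
            seen.append(i)
    return [p / len(perms) for p in phi]

# Arbitrary (non-additive) toy game, chosen only for illustration.
v = lambda S: sum(p + 1 for p in S) ** 1.5
phi = shapley_perm(v, 3)
print(sum(phi), v((0, 1, 2)) - v(()))  # both equal: efficiency holds
```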
For each prediction \(f(x)\), define a game:
\[\begin{align} v_{x}(S) = \mathbb{E}_{p(x'_{S^C} \mid x_{S})}[f(x_{S}, x'_{S^C})] \end{align}\]
where \(x_S\) denotes coordinates in \(S\) fixed at their observed values, and \(x'_{S^C}\) are the remaining coordinates drawn from their conditional distribution.
How should \(v(S)\) be defined? This choice determines what “importance” means.
Feature \(i\)’s contribution to \(f(x)\):
\[\begin{align} \varphi_{x}(f, i) = \frac{1}{D} \sum_{d = 1}^{D} \frac{1}{\binom{D-1}{d-1}}\sum_{S \in S_{d}(i)}[v_{x}(S) - v_{x}(S \setminus \{i\})] \end{align}\]
These satisfy: \[\sum_{i=1}^D \varphi_{x}(f, i) = f(x) - \mathbb{E}[f(X)]\]
Each \(\varphi_x(f,i)\) explains part of the prediction’s deviation from the baseline \(\mathbb{E}[f(X)]\).
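A minimal sketch of per-prediction attribution, where \(v_x(S)\) is approximated by averaging \(f\) over background rows for the unfixed coordinates (a marginal / independence assumption rather than the exact conditional expectation; the model and data are hypothetical):

```python
from itertools import combinations
from math import comb

def shap_values(f, x, background):
    """Exact Shapley attributions for prediction f(x), with v_x(S) taken as
    the background average of f, filling features outside S from each row."""
    D = len(x)

    def v(S):
        vals = [f([x[j] if j in S else row[j] for j in range(D)])
                for row in background]
        return sum(vals) / len(vals)

    phi = []
    for i in range(D):
        total = 0.0
        for d in range(1, D + 1):
            for S in combinations(range(D), d):
                if i in S:
                    total += (v(frozenset(S)) - v(frozenset(S) - {i})) \
                             / comb(D - 1, d - 1)
        phi.append(total / D)
    return phi

# Hypothetical model with an interaction term.
f = lambda z: z[0] + 2 * z[1] + z[0] * z[1]
background = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [1.0, 1.0]
phi = shap_values(f, x, background)
baseline = sum(f(row) for row in background) / len(background)
print(phi, sum(phi), f(x) - baseline)  # attributions sum to f(x) - E[f(X)]
```

Note how the interaction term’s credit is shared between the two interacting features, while the sum of attributions recovers exactly \(f(x) - \mathbb{E}[f(X)]\).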
\[\begin{align} v_{x}(S) = \mathbb{E}_{p(x'_{S^C} \mid x_{S})}[f(x_{S}, x'_{S^C})] \end{align}\]
Conditional expectation when features \(S\) are fixed at \(x_S\).
Respond to [SHAP Visual Explanation] in the exercise sheet.
The conditional expectation \(\mathbb{E}[f(x_S, X_{S^C}) \mid x_S]\) is generally not computable.
This forces a choice of how to approximate it in practice.
We discuss this further next week.
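As a hedged sketch of one possible approximation: replace the conditional distribution with the empirical marginal, filling the unfixed coordinates from rows of a background dataset. This is an independence assumption, not the exact conditional expectation (all names and numbers below are illustrative):

```python
import random

def v_hat(f, x, S, background, n_samples=200, seed=0):
    """Monte Carlo estimate of v_x(S): features in S are fixed at x,
    the rest are filled in from sampled rows of a background dataset.
    Sampling whole rows for S^C uses the *marginal* distribution,
    i.e. it assumes feature independence."""
    rng = random.Random(seed)
    D = len(x)
    total = 0.0
    for _ in range(n_samples):
        row = rng.choice(background)
        z = [x[j] if j in S else row[j] for j in range(D)]
        total += f(z)
    return total / n_samples

# Toy check with a linear model: the estimate should land near
# f with the unfixed coordinate replaced by its background mean.
f = lambda z: 2 * z[0] + 3 * z[1]            # hypothetical model
background = [[0, 0], [1, 1], [0, 1], [1, 0]]
print(v_hat(f, [1.0, 1.0], {0}, background))  # near 2*1 + 3*0.5 = 3.5
```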
Attributions sum to the deviation \(f(x) - \mathbb{E}[f(X)]\): visualize as a stacked bar starting at the baseline and ending at the prediction.
Such compact visualizations, applied across many samples, help identify groups of samples with similar explanations.
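The stacked-bar layout can be sketched as the segment positions of a waterfall: start at the baseline and add one attribution at a time, ending at the prediction (the attribution values below are hypothetical):

```python
def waterfall_segments(phi, baseline):
    """Segment (start, end) positions for a stacked-bar / waterfall plot:
    begin at the baseline E[f(X)], add one attribution at a time,
    and end at the prediction f(x)."""
    segments, pos = [], baseline
    for p in phi:
        segments.append((pos, pos + p))
        pos += p
    return segments

# Hypothetical attributions for one prediction.
phi = [1.0, -0.5, 1.5]
baseline = 2.0
segs = waterfall_segments(phi, baseline)
print(segs)          # [(2.0, 3.0), (3.0, 2.5), (2.5, 4.0)]
print(segs[-1][1])   # final position = baseline + sum(phi) = f(x)
```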